Convolutional Neural Network
Extract, composite, and match simple shapes.
Model
$$ \begin{aligned} I_n &\to \boxed{\phi_1, \dots, \phi_K} \to M_n = f(I_n; \phi_1, \dots, \phi_K) \\ &\to \boxed{\psi_1, \dots, \psi_K} \to L_n = f(M_n; \psi_1, \dots, \psi_K) \\ &\to \boxed{\omega_1, \dots, \omega_K} \to G_n = f(L_n; \omega_1, \dots, \omega_K) \\ &\to \boxed{W} \to p_n = g(G_n; W) \end{aligned} $$
Risk Function
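$$ \begin{aligned} \mathcal{L}(\Phi, \Psi, \Omega, W) &= \frac{1}{N} \sum_{n=1}^{N} \ell(y_n, p_n) \end{aligned} $$
- $ \Phi = \{\phi_1, \dots, \phi_K\} $, $ \Psi = \{\psi_1, \dots, \psi_K\} $, $ \Omega = \{\omega_1, \dots, \omega_K\} $ : Filters of the three convolutional stages
- $ \ell(y_n, p_n) $ : Per-example loss between the label $ y_n $ and the prediction $ p_n $
Parameters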
$$ \begin{aligned} (W, H, C_{\text{in}}, C_{\text{out}}) \end{aligned} $$
- $ W, H $: Input Image Width & Height (here $ W $ is the width, not the classifier weight $ W $ in the model)
- $ C_{\text{in}}, C_{\text{out}} $: Input Channel Size, Output Channel Size
- Kernel Size: $ k $ ; Padding Size: $ p $ ; Stride: $ s $
Output Size
Reference: "A guide to convolution arithmetic for deep learning" (Dumoulin & Visin, 2016)
The leading $ 1 $ counts the kernel's starting position. The floor $ \lfloor \cdot \rfloor $ is used because a final step that would run past the edge of the (padded) input produces no output. Pooling uses the same formula with $ p = 0 $.
$$ \begin{aligned} o &= 1 + \left\lfloor \frac{i + 2p - k}{s} \right\rfloor &\text{[for convolution]} \\ o &= 1 + \left\lfloor \frac{i - k}{s} \right\rfloor &\text{[for pooling, } p = 0 \text{]} \end{aligned} $$
- $ o $ : Output Size
- $ i $ : Input Size
- $ k $ : Kernel Size
- $ s $ : Stride Size
- $ p $ : Padding Size
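
The output-size formula can be checked with a few lines of Python. This is a minimal sketch, not taken from the notes: the function names `conv_output_size`, `pool_output_size`, and `count_window_positions` are made up for illustration, and it assumes one spatial axis, symmetric zero padding, and no dilation.

```python
def conv_output_size(i: int, k: int, s: int = 1, p: int = 0) -> int:
    """o = 1 + floor((i + 2p - k) / s) along one spatial axis."""
    if s < 1:
        raise ValueError("stride must be >= 1")
    if i + 2 * p < k:
        raise ValueError("kernel does not fit in the padded input")
    # Integer floor division implements the floor in the formula.
    return 1 + (i + 2 * p - k) // s


def pool_output_size(i: int, k: int, s: int) -> int:
    """Pooling without padding: the same formula with p = 0."""
    return conv_output_size(i, k, s, p=0)


def count_window_positions(i: int, k: int, s: int = 1, p: int = 0) -> int:
    """Brute-force check: count kernel start positions that fit entirely
    inside the padded axis (hypothetical helper, for verification only)."""
    padded = i + 2 * p
    return len(range(0, padded - k + 1, s))


if __name__ == "__main__":
    print(conv_output_size(32, k=3, s=1, p=1))  # padding of 1 keeps a 3-wide kernel size-preserving
    print(conv_output_size(8, k=3, s=2, p=0))   # the last partial step is floored away
    print(pool_output_size(32, k=2, s=2))       # 2x2 pooling with stride 2 halves the axis
```

For a 2-D image, apply the function once to the width and once to the height. The brute-force counter agrees with the closed form because a window starting at position `j` fits only when `j + k` does not exceed the padded length.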